Working Women: A study on female participation in the labor force around the world
a byline that describes my motivation
Some of the research questions I will explore include:
How do female participation rates vary from country to country?
What variables in the data set correlate with female participation rates?
Do other variables, such as life expectancy and region, relate to each other?
The data used in this analysis is from The World Bank. I used the gender section to find variables related to gender differences in working levels.
The original data set contained over 60 numerical variables. I selected a few of the ones with the most data so I could focus on some key indicators.
Country: the country of the observation
- Not all countries are represented, and some have more data than others.
Year: the year of the observation
- Female and male life expectancy and fertility rate have data for many countries going back to 1960.
- Female and male participation rates and the female percentage of the labor force have data starting in 1990.
Region: the region of the country
- There are 7 regions.
Income Level: the income level of the country
- There are 4 income levels.
- According to the World Bank, “the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year.” More about the income classification can be found here.
Male Life Expectancy: life expectancy at birth, male (years)
Female Life Expectancy: life expectancy at birth, female (years)
Fertility Rate: number of children born per woman on average (births per woman)
Female Labor: female labor force as a proportion of the total labor force (percentage)
- Shows how active women are in relation to others in the labor force.
- The labor force is made up of people aged 15 or older who supply labor.
Female Participation: percentage of women aged 15 or older who supply labor
Male Participation: percentage of men aged 15 or older who supply labor
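To make the difference between the two female labor measures concrete, here is a minimal sketch with made-up counts. All four input numbers are hypothetical, not World Bank data, and the sketch assumes the labor force consists only of the women and men counted here.

```r
# Hypothetical counts for one country (not real data)
women_15_plus        <- 1000  # women aged 15 or older
men_15_plus          <- 1000  # men aged 15 or older
women_in_labor_force <- 500   # women who supply labor
men_in_labor_force   <- 750   # men who supply labor

# Participation rates: share of each group that supplies labor
female_participation <- 100 * women_in_labor_force / women_15_plus
male_participation   <- 100 * men_in_labor_force / men_15_plus

# Female labor: women's share of the total labor force
female_labor <- 100 * women_in_labor_force /
  (women_in_labor_force + men_in_labor_force)

c(female_participation = female_participation,
  male_participation   = male_participation,
  female_labor         = female_labor)
```

In this toy case half of all women work, but women make up only 40% of the labor force because more men work. The two measures use different denominators, so they can differ even though both rise when more women enter the labor force.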
In both the summary statistics and correlation tabs, only data from 2020 will be used.
Summary Statistics
The summary statistics tab shows information about each of the variables in the data set.
The number of countries in each region and income group is shown at the top.
The minimum, mean, maximum, and percentage of missing values are shown for each of the numerical variables.
Female life expectancy is higher on average than male life expectancy (75.47 vs. 70.57 years).
The male participation rate tends to be higher than the female participation rate (a mean of 69.2% vs. 49.69%).
Both the female percentage of the labor force and the female participation rate vary widely across countries, ranging from 8.27% to 54.91% and from 6.08% to 83.05%, respectively.
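The four statistics reported for each numerical variable can be sketched in base R as follows. This is an illustration on toy values, not the dashboard's own code. The helper name `summarise_column` and the data are hypothetical, and it assumes missing values are dropped before the minimum, mean, and maximum are computed.

```r
# Toy data with one missing value (hypothetical numbers)
toy <- data.frame(
  female_participation = c(6.1, 49.7, 83.0, NA),
  male_participation   = c(44.2, 69.2, 95.4, 70.0)
)

# Hypothetical helper: min, mean, and max ignore NAs;
# missing_pct is the percentage of NA entries.
summarise_column <- function(x) {
  c(min         = min(x, na.rm = TRUE),
    mean        = mean(x, na.rm = TRUE),
    max         = max(x, na.rm = TRUE),
    missing_pct = 100 * mean(is.na(x)))
}

# One row per variable, one column per statistic
stats <- t(sapply(toy, summarise_column))
round(stats, 2)
```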
Correlation Plot
The correlation plot shows relationships between the numerical variables in the data set.
Male and female life expectancy are the most highly correlated variables in the data set. This is likely because men and women in the same country share similar living conditions.
Female life expectancy and fertility rate are the most strongly negatively correlated variables in the data set. This means that women tend to live longer in countries where the average fertility rate is lower.
Female participation and female labor are also very strongly positively correlated. This makes sense mathematically: as the proportion of women working increases, so should the female percentage of the labor force.
Apart from female labor, female participation is not strongly correlated with any of the other variables in the data set.
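The dashboard's correlation plot is drawn with `use = "complete.obs"`, which drops any country that has a missing value before the correlations are computed. A minimal sketch of that behavior with base R's `cor()`, using hypothetical toy values chosen only to mimic the directions described above:

```r
# Toy data: the fifth row has a missing fertility rate (hypothetical values)
toy <- data.frame(
  m_life_exp     = c(78, 75, 62, 58, 70),
  f_life_exp     = c(82, 79, 66, 61, 74),
  fertility_rate = c(1.5, 1.8, 4.6, 5.2, NA)
)

# "complete.obs" removes every row containing an NA, so all pairwise
# correlations are computed on the same four complete rows.
corr_matrix <- cor(toy, use = "complete.obs")
round(corr_matrix, 2)
```

Because rows are removed listwise, a variable with many missing values shrinks the sample used for every pair, not only for the pairs that involve it.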
In the next few tabs, I will explore the relationship between female participation and region and income.
Categorical Variables
| Region | Countries | Income Group | Countries |
|---|---|---|---|
| East Asia & Pacific | 37 | Low income | 28 |
| Europe & Central Asia | 58 | Lower middle income | 54 |
| Latin America & Caribbean | 42 | Upper middle income | 54 |
| Middle East & North Africa | 21 | High income | 80 |
| North America | 3 | NA's | 1 |
| South Asia | 8 | | |
| Sub-Saharan Africa | 48 | | |
Numerical Variables
| Variable | Min | Mean | Max | Missing Values (%) |
|---|---|---|---|---|
| Male Life Expectancy | 51.45 | 70.57 | 82.9 | 8.29 |
| Female Life Expectancy | 55.88 | 75.47 | 88 | 8.29 |
| Fertility Rate | 0.84 | 2.57 | 6.74 | 7.83 |
| Female Percentage of Labor Force | 8.27 | 41.17 | 54.91 | 13.82 |
| Male Participation Rate | 44.24 | 69.2 | 95.44 | 13.82 |
| Female Participation Rate | 6.08 | 49.69 | 83.05 | 13.82 |
The table below shows a glimpse of the data and fields from 2020, the most recent complete year of reporting. Throughout the dashboard, this will be the year I focus on.
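The data table is in long format, with one row per country and year and one column per indicator, whereas the raw file has one column per year. A minimal sketch of that kind of reshaping, assuming the tidyr package and hypothetical toy values. The country codes and numbers are made up, and this is an alternative illustration rather than the dashboard's own reshaping code.

```r
library(tidyr)

# Hypothetical wide extract: one row per country/indicator pair,
# one column per year (toy values, not World Bank data).
wide <- data.frame(
  country_code = c("AAA", "AAA", "BBB", "BBB"),
  indicator    = c("f_life_exp", "fertility_rate",
                   "f_life_exp", "fertility_rate"),
  `2019`       = c(80.1, 1.7, 65.2, 4.4),
  `2020`       = c(80.3, 1.6, 65.5, 4.3),
  check.names  = FALSE
)

# Step 1: stack the year columns into (year, value) pairs.
long <- pivot_longer(wide, cols = c("2019", "2020"),
                     names_to = "year", values_to = "value")
long$year <- as.integer(long$year)

# Step 2: give each indicator its own column.
tidy <- pivot_wider(long, names_from = indicator, values_from = value)
tidy
```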
---
title: "Working Women"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
    theme:
      bootswatch: zephyr
---
```{r setup, include=FALSE}
library(flexdashboard)
```
```{r imports}
setwd("C:/Users/clari/Documents/School/Classes/MTH 209/final project")
library(pacman)
p_load(tidyverse, ggplot2, RColorBrewer, DataExplorer, vtable, scales)
gender <- read_csv("data/gender.csv", skip = 4)
colnames(gender) <- mapply(gsub, 'X', '', colnames(gender), USE.NAMES = FALSE)
gender <- gender %>% rename(country_code = "Country Code", country_name = "Country Name", ind_code = "Indicator Code", ind_name = "Indicator Name")
region_income <- read_csv("data/region_income_level.csv")
region_income <- region_income %>% rename(country_code = "Country Code", region = "Region", income_group = "IncomeGroup") %>%
select(country_code, region, income_group)
region_income <- region_income %>% subset(!is.na(country_code)) %>% subset(nchar(country_code) == 3)
indicator_names = c("m_life_exp","f_life_exp", "fertility_rate", "female_labor", "male_participation", "female_participation")
###Trying for all countries - need to make function
df <- gender %>% mutate(indicator = case_when(
ind_code == "SP.DYN.LE00.MA.IN" ~ indicator_names[1],
ind_code == "SP.DYN.LE00.FE.IN" ~ indicator_names[2],
ind_code == "SP.DYN.TFRT.IN" ~ indicator_names[3],
ind_code == "SL.TLF.TOTL.FE.ZS" ~ indicator_names[4],
ind_code == "SL.TLF.CACT.MA.ZS" ~ indicator_names[5],
ind_code == "SL.TLF.CACT.FE.ZS" ~ indicator_names[6]
))
df <- subset(df, !is.na(indicator))
df <- df %>% select(-c(ind_name, ind_code)) %>% select("indicator", "country_name", "country_code", everything())
# Reshape to long format: one row per country and year, one column per indicator.
# Assumes every indicator lists the same countries in the same order, and that
# columns 4:65 hold the 62 yearly values for 1960-2021. unlist() walks those
# columns year by year, matching the order of the `year` vector below.
df <- data.frame(country = rep(unique(df$country_name), 62),
                 country_code = rep(unique(df$country_code), 62),
                 year = rep(1960:2021, each = length(unique(df$country_name))),
                 m_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[1], 4:65]))),
                 f_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[2], 4:65]))),
                 fertility_rate = unname(unlist(as.vector(df[df$indicator==indicator_names[3], 4:65]))),
                 female_labor = unname(unlist(as.vector(df[df$indicator==indicator_names[4], 4:65]))),
                 male_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[5], 4:65]))),
                 female_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[6], 4:65])))
)
df <- df %>% left_join(region_income, by = "country_code") %>%
select(country, year, country_code, region, income_group, everything())
df <- df %>% mutate_if(is.character, as.factor)
df$income_group <- factor(df$income_group, levels = c("Low income", "Lower middle income", "Upper middle income", "High income"))
```
Data Introduction
=======================================================================
Column {.tabset data-width=600 .tabset-fade}
-----------------------------------------------------------------------
### Motivation and Background
<font size="5"> **Working Women: A study on female participation in the labor force around the world**</font>
a byline that describes my motivation
Some of the research questions I will explore include:
- How do female participation rates vary from country to country?
- What variables in the data set correlate with female participation rates?
- Do other variables, such as life expectancy and region, relate to each other?
The data used in this analysis is from [The World Bank](https://genderdata.worldbank.org/). I used the gender section to find variables related to gender differences in working levels.
### Variable Explanations
The original data set contained over 60 numerical variables. I selected a few of the ones with the most data so I could focus on some key indicators.
- **Country**: the country of the observation
  - Not all countries are represented, and some have more data than others.
- **Year**: the year of the observation
  - Female and male life expectancy and fertility rate have data for many countries going back to 1960.
  - Female and male participation rates and the female percentage of the labor force have data starting in 1990.
- **Region**: the region of the country
  - There are 7 regions.
- **Income Level**: the income level of the country
  - There are 4 income levels.
  - According to the World Bank, "the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year." More about the income classification can be found [here](https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level-2022-2023#).
- **Male Life Expectancy**: life expectancy at birth, male (years)
- **Female Life Expectancy**: life expectancy at birth, female (years)
- **Fertility Rate**: number of children born per woman on average (births per woman)
- **Female Labor**: female labor force as a proportion of the total labor force (percentage)
  - Shows how active women are in relation to others in the labor force.
  - The labor force is made up of people aged 15 or older who supply labor.
- **Female Participation**: percentage of women aged 15 or older who supply labor
- **Male Participation**: percentage of men aged 15 or older who supply labor
### Analysis
In both the summary statistics and correlation tabs, only data from 2020 will be used.
**Summary Statistics**
The summary statistics tab shows information about each of the variables in the data set.
The number of countries in each region and income group is shown at the top.
The minimum, mean, maximum, and percentage of missing values are shown for each of the numerical variables.
- Female life expectancy is higher on average than male life expectancy (75.47 vs. 70.57 years).
- The male participation rate tends to be higher than the female participation rate (a mean of 69.2% vs. 49.69%).
- Both the female percentage of the labor force and the female participation rate vary widely across countries, ranging from 8.27% to 54.91% and from 6.08% to 83.05%, respectively.
**Correlation Plot**
The correlation plot shows relationships between the numerical variables in the data set.
- Male and female life expectancy are the most highly correlated variables in the data set. This is likely because men and women in the same country share similar living conditions.
- Female life expectancy and fertility rate are the most strongly negatively correlated variables in the data set. This means that women tend to live longer in countries where the average fertility rate is lower.
- Female participation and female labor are also very strongly positively correlated. This makes sense mathematically: as the proportion of women working increases, so should the female percentage of the labor force.
- Apart from female labor, female participation is not strongly correlated with any of the other variables in the data set.
In the next few tabs, I will explore the relationship between female participation and region and income.
Column {.tabset data-width=400}
-----------------------------------------------------------------------
### Summary Statistics
<br>
<span style="color: lightgrey;">Categorical Variables</span>
``` {r summary_cat}
data_2020 <- df %>% subset(year == "2020") %>% select(-c("year")) %>% subset(!is.na(region))
region_income_table <- summary(data_2020 %>% select(region, income_group))
colnames(region_income_table) <- c("Region", "Income Group")
region_income_table
```
<span style="color: lightgrey;">Numerical Variables</span>
``` {r summary_num}
labs <- c('Male Life Expectancy',
'Female Life Expectancy',
'Fertility Rate',
'Female Percentage of Labor Force',
'Male Participation Rate',
'Female Participation Rate')
st(data_2020 %>% select(-c("region", "income_group", "country", "country_code")),
summ=c('min(x)',
'mean(x)',
'max(x)',
'propNA(x)*100'),
summ.names = c('Min',
'Mean',
'Max',
'Missing Values (%)'),
title = "",
digits = 2,
labels = labs)
```
### Correlation
``` {r correlation}
corr <- data_2020 %>% select(-c("region", "income_group", "country", "country_code"))
plot_correlation(corr, cor_args = list("use" = "complete.obs"))
```
Current Exploration
=======================================================================
Column {.tabset}
----------------------------------------------------------------------
### Data Table
<br>
The table below shows a glimpse of the data and fields from 2020, the most recent complete year of reporting. Throughout the dashboard, this will be the year I focus on.
<br>
``` {r view}
DT::datatable(df %>% filter(year == "2020", !is.na(region))) %>%
DT::formatRound(columns=c("female_labor", "male_participation", "female_participation"), digits=3)
```
### Female Participation Map
<p align="left"><iframe src="https://public.tableau.com/views/maps_16696760316400/MapbyFParticipation?:language=en-US&:display_count=n&:origin=viz_share_link&:showVizHome=no&:embed=true" width="600" height="400"></iframe></p>
Female Participation
=======================================================================